Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN)queries return the K closest answers according to given distance metric in thedatabase with respect to Q. In this scenario, it is possible that a majority ofthe answers may be very similar to some other, especially when the data hasclusters. For a variety of applications, such homogeneous result sets may notadd value to the user. In this paper, we consider the problem of providingdiversity in the results of KNN queries, that is, to produce the closest resultset such that each answer is sufficiently different from the rest. We firstpropose a user-tunable definition of diversity, and then present an algorithm,called MOTLEY, for producing a diverse result set as per this definition.Through a detailed experimental evaluation on real and synthetic data, we showthat MOTLEY can produce diverse result sets by reading only a small fraction ofthe tuples in the database. Further, it imposes no additional overhead on theevaluation of traditional KNN queries, thereby providing a seamless interfacebetween diversity and distance.
展开▼